home *** CD-ROM | disk | FTP | other *** search
- Pentium Study
-
- http://www.ibm.com
- IBM study: likelihood of Pentium chip error
-
-
-
-
- IBM Research focused on the likelihood of error on the
- Pentium chip in everyday floating point division activities.
- Intel has analyzed the probability of making an error based
- on the assumption that any possible 64 bit pattern is
- equally likely to occur in both the numerator and the
- denominator. If that were the case, then the chances of the
- error would be 1 in 9 billion. They also estimate that an
- average spreadsheet user will perform 1,000 floating point
- division per day. Based on these assumptions, Intel
- estimates that an error will be encountered only once in 9
- million days (or once in about 27,000 years).
-
- Our analysis shows that the chances of an error occurring
- are significantly greater when we perform simulations of the
- types of calculations performed by financial spreadsheet
- users, because, in this case, all bit patterns are not
- equally probable.
-
- Probability Tests:
-
- We have analyzed a Pentium chip in order to understand the
- sources of errors and have found that in order for an error
- to occur, both the numerator and denominator must have
- certain "bad" bit patterns, as described below.
-
- First, for the denominator to be "at risk", that is, capable
- of producing an error with certain numerators, it must
- contain a long string of consecutive 1's in its binary
- representation. Although such numbers represent only a very
- small fraction of all possible numbers, they do occur much
- more frequently when denominators are created by adding,
- subtracting, multiplying or dividing simple numbers. For
- example, the number 3.0 represented exactly, does not have
- that pattern, but the result of the computation 4.1-1.1 does
- have that pattern.
-
- How many denominators produced in this fashion can be "at
- risk" that is, capable of producing an error for certain
- numerators? When we randomly added or subtracted ten random
- numbers having a single digit dollar amount and two digit in
- cents, for example, $4.57, then one out of every 300 of the
- results was "at risk" and hence capable of producing an
- error. If we repeated the test with numbers having two
- digit dollar amounts and two digits in cents, then one out
- of every 2,000 could cause an error. If the denominator was
- calculated by dividing two numbers having one digit to the
- left and one to the right of the decimal point, then
- approximately one in every 200 could cause an error.
-
- For simplicity, suppose that one of every 1000 denominators
- produced by some calculations was "at risk."
-
- Now, suppose we have created a bad denominator. What is the
- chance of now encountering a bad numerator, which will
- produce an error? It depends on the actual value of the
- "at risk" denominator, but based on our tests, a
- conservative estimate would be that only one out of every
- 100,000 numerators causes a problem.
-
- Finally, when we combine the chances of a bad numerator and
- the chances of a bad denominator, the result is that one out
- of every 100 million divisions will give a bad result. Our
- conclusion is vastly different from Intel's.
-
- Frequency Tests:
-
- We also questioned Intel's analysis and assumption that
- spreadsheet users will only perform 1,000 divides in a day.
- Tests run independently suggest that a spreadsheet user
- (Lotus 1-2-3) does about 5,000 divides every second when he
- is calculating his spreadsheet. Even if he does this for
- only 15 minutes a day, he will perform 4.2 million divides
- in a day, and according to our probability findings, on
- average, a computer could make a mistake every 24 days.
- Hypothetically, if 100,000 Pentium customers were doing 15
- minutes of calculations every day, we could expect 4,000
- mistakes to occur each day.
-
- Conclusion:
-
- The Pentium processor does make errors on floating point
- divisions when both the numerator and denominator of the
- division have certain characteristics. Our study is an
- analysis based on probabilities and chances. In reality, a
- user could face either significantly more errors or no
- errors at all. If an error occurs, it will first appear in
- the fifth or higher significant digit and it may have no
- effect or it may have catastrophic effects.
-
-
-
- Additional Technical Detail:
-
- Some Experiments on Pentium Using Decimal Numbers
-
- According to an Intel white paper, if you were to choose a random
- binary bit pattern for numerator and the denominator, the probability
- of error in divide is about 1 in 9 billion.
-
- The error occurs when certain divisors (termed "at risk" or bad)
- are divided into certain numerators. In order for the error to
- occur, our belief is that divisors must lie in a certain range.
- For each such denominator, there is a range of numerator values
- which produce an incorrect result.
-
- An example of affected numbers is the decimal constants
- we hardwire in our programs. For example, if converting
- from months to years and we are interested in 7-8 decimal digits
- of accuracy, then we can hard wire a constant to convert from
- months to years.
-
- alpha = 1/12 = .083333333
-
- Let us construct a hypothetical example. We have contracted
- a job which is expected to last 22.5 months. The total
- value of the contract is $96000. From this, tax at the
- rate of 14 and 2/3 percent rate has to be deducted.
- The taxing authority has defined 14 and 2/3 percent to
- be 14.66667. We want to calculate the net take at a per
- annum basis. We do the following calculations.
-
- Tax = 96000*.1466667 = 14080.0032
-
- Net take home money = 96000 - 14080.0032 = 81919.9968
-
- The number of years in 22.5 months = 22.5*.083333333
- = 1.8749999925 years
-
- Net take home money per annum = 81919.9968/1.8749999925
-
- = $43690.6667
-
- Most machines give the above answer which satisfies the
- desired 7-8 digit accuracy criterion. On Pentium,
- the answer is $43690.53, which has only 5 correct digits.
-
- In this example, both numerator and denominator are bad numbers.
- They are both near some simple integer boundary in their binary
- presentation and as you rightly observed, these numbers occur
- in real world at a much higher frequency compared to the
- totally random bit pattern hypothesis.
-
- Probabilistic Analysis
-
- We are addressing the question of how likely it is to have a bad
- divisor. On Pentium, a bad divisor belongs to one of the five
- bad table entries characterized by 1.0001, 1.0100, 1.0111, 1.1010, and
- 1.1101, followed by a string of 1's in the mantissa.
-
- We have found that if the string of 1's is of length 20 or so, then it is
- a bad divisor. Given a bad divisor, the probability of making an error
- in the division increases dramatically, compared to the 1 in 9 billion
- figure quoted by Intel.
-
- We did some simple experiments using decimal numbers and the findings
- are reported below. We counted only those bad divisors
- which belong to one of the above five table values, followed
- by a string of 32 1's. Intel people argue that all binary patterns
- are equally likely. If that was really the case, the probability
- of finding a bad divisor, as defined above, will be 5/(2**36)
- or about one in 13 billion random divisors. However, we are finding
- the probabilities to be much higher.
-
- Addition/Subtraction of Decimal Numbers
-
- In this experiment, we randomly added or subtracted, 10 uniformly distributed
- random numbers having one or two decimal digits (as in dollars and cents)
- and then we examined the result for the above binary patterns.
- Here are the results for two cases. In the
- first case, we chose only one digit to the left of decimal (as in $3.47)
- and in the second case, we chose two digits to the left of the decimal
- (as in $29.56). All the digits were chosen randomly with uniform probability.
- In the third case, we chose one digit to the right of the decimal point
- and two digits to the left.
- The results below give the number of times the result of this
- experiment has the bit pattern corresponding to a bad divisor.
-
- Case 1 (one digit to the left, two to the right) --- 188 out of 100,000
- Case 2 (two digits to the left, two to the right) --- 45 out of 100,000
- Case 3 (two digits to the left, one to the right) --- 356 out of 100,000
-
- Clearly, these probabilities are much higher than those obtained
- with the random bit pattern hypothesis.
-
- Division Of Two Decimal Numbers:
-
- These experiments were conducted through exhaustive tests on
- all possible digits patterns. Here (a.b)/(c.d) represents
- division of a two digit (one to the left of the decimal point
- and one to the right of the decimal point) number by another
- two digit number.
-
- (a.b)/(c.d) - 44 out of 10,000
- (0.ab)/(0.cd) - 27 out of 10,000
- (a.bc)/(d.ef) - 344 out of 1,000,000
- (ab.c)/(de.f) - 422 out of 1,000,000
-
- Multiplication of Two Numbers
-
- Here we are multiplying a decimal number by another number which
- was computed as a reciprocal of another decimal number as in scaling
- by a constant.
-
- (a.b) * (1/(c.d)) - 37 out of 10,000
- (a.bc) * (1/(d.e)) - 139 out of 100,000
- (a.bc) * (1/(d.ef)) - 434 out of 1,000,000
-
- To summarize, for the decimal calculations of the type given above,
- the probability of having a result which falls into the category of
- being a bad divisor is rather high. It appears to be somewhere
- between 1 in 3000 to 1 in 250. Let us say that it is of the order
- of 1 in 1000.
-
- Furthermore, if the rounding mode corresponds to truncate, the
- probability of arriving at bad divisors increases significantly.
-
-
- The Dependency on Numerator
-
- Given a bad divisor, the divide error occurs for some range of values
- of the numerator. If we were to take a totally random bit
- pattern for the denominator, the probability of error
- appears to be of the order one in 100,000. This is a first cut
- rough estimate and probably could be improved. It appears that
- probabilities are different for different table values. The
- table corresponding to '1.0001' seems to have the most error.
- For numerator also, there are
- bands of values where the error is much more likely. Again
- these bands are more prominent near whole numbers.
- For example. if we were using (19.4 - 10.4) = '9' as a divisor
- (a bad one), and you picked a random value between 6 and 6.01,
- as the numerator then the chance of error increases to about
- one in 1000.
-
- For the purpose of our simplistic analysis, we will use the
- figure of 1 in 100,000 for a bad numerator. This assumes that
- we are picking up a random numerator. Using the value of 1
- in 1000 as the probability for a bad divisor, the overall
- probability for a 'typical' divide being incorrect seems to
- be of the order of 1 in 100 millions. This is about two orders of
- magnitude higher compared to the Intel estimate of 1 in 9 billion.
-
-
-
- Probability of a Divide Instruction
-
- Let us assume that a Pentium operating at 90 MHz does an op
- in 1.2 cycles on the average. That will give about 75 Million ops
- per second of actual compute time. We will use
- a figure of 1 divide per 16,000 instructions, even though many
- estimates suggest a much higher frequency of divide.
-
- Thus using this conservative estimate of one divide per 16,000
- instructions, we come up at about 4687 divides per second.
- Let us further assume that a typical spread-sheet user
- does only about 15 minutes of actual intensive computing
- per day. Then, he is likely to do 4687*900 = 4.2 million
- divides per day. Assuming an error rate of 1 in 100 million,
- it will take about 24 days for an error to occur for an
- individual user.
-
- Combine this with the fact that there are millions of
- PENTIUM users worldwide, we quickly come to the conclusion
- that on a typical day a large number of people are making
- mistakes in their computation without realizing it.
-
-
- IBM Corporation
- ibmstudy@watson.ibm.com
-
-
- <!-- end ibmbody--------------------------------------------------------- -->
-
- <p> <hr>
- <b>
- [
- <a href="http://www.ibm.com/">IBM home page</a> |
- <a href="http://www.ibm.com/Orders/">Order</a> |
- <a href="http://www.austin.ibm.com/search/">Search</a> |
- <a href="http://www.ibm.com/Assist/">Contact IBM</a> |
- <a href="http://www.ibm.com/Finding/">Help</a> |
- <a href="http://www.ibm.com/copyright.html"> Copyright</a> |
- <a href="http://www.ibm.com/trademarks.html">tm</a>
- ]
- </b>
-
- </body>
- </html>
-
-
-